Annotating Errors in Student Texts: First Experiences and Experiments
Abstract
We describe the creation of an annotation layer for word-based writing errors in a corpus of student writing. The texts are written in Swedish by students between 9 and 19 years old. Our main purpose is to identify errors in spelling, split compounds and merged words. In addition, we identify simple word-based grammatical errors, including morphological errors and extra words. In this paper we describe the corpus and the annotation process, including detailed descriptions of the error types and guidelines. We find that this annotation can be performed with substantial inter-annotator agreement, but that some issues with the annotation remain. We also report results of two pilot experiments on spelling correction and on the consistency of downstream NLP tools, to exemplify the usefulness of the annotated corpus.
Similar resources
Explanation of Residents' Experiences Concerning Medication Errors in Neonatal Intensive Care Units: A Qualitative Study
Introduction: Medication errors are a potentially hazardous accident for the patients and can be used as a measure of patient safety in the healthcare system. Neonates are the most vulnerable population because of their body size. The experiences and views of those involved in the healthcare system can be a significant source of information gathering and planning in preventing medication errors...
Investigation of different types of nursing errors based on their lived and working experiences in health centers; A qualitative study
Introduction: The occurrence of human error is inevitable, and the health area and the nurses are no exception. Because nursing service error is a harmful and in some cases irrecoverable phenomenon, identifying the types of nursing errors in order to reduce them and improve patient safety is vital. Methods: This research was performed qualitatively and through a d...
Corpus building for Mongolian language
This paper presents ongoing research aimed at building the first corpus, of 5 million words, for the Mongolian language, focusing on annotating and tagging corpus texts according to the TEI XML (McQueen, 2004) format. It also presents a tool, MCBuilder, which supports flexible manual annotation and manipulation of the corpus texts with XML structure.
Detecting Code-Switching in a Multilingual Alpine Heritage Corpus
This paper describes experiments in detecting and annotating code-switching in a large multilingual diachronic corpus of Swiss Alpine texts. The texts are in English, French, German, Italian, Romansh and Swiss German. Because of the multilingual authors (mountaineers, scientists) and the assumed multilingual readers, the texts contain numerous code-switching elements. When building and annotati...
Annotating Article Errors in Spanish Learner Texts: Design and Evaluation of an Annotation Scheme
Annotating a corpus with error information is a challenging task. This paper describes the design, evaluation and refinement of an annotation scheme for Spanish article errors in learner data, so that future work on corpus annotation and automatic article error detection can progress. To evaluate reliability, 300 noun phrases with definite, indefinite and zero article have been tagged by four a...